Here I estimate Alu enrichment in DHX9 samples compared to the other samples, using two different approaches.
A repeat sequence library was constructed containing all repeat classes in the human genome matching the following criteria:
This led to a library of ~2000 repeat subfamilies. For each subfamily, the library contains the canonical sequence and the genomic instances, separated by spacers of 100 “N” bases. Each instance is given a 20 bp overhang to allow junction reads to map.
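The library layout described above can be sketched as follows. This is a minimal illustration, not the script actually used: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the 20 bp overhang is assumed to be added on both sides of each instance.

```python
SPACER = "N" * 100  # spacer of 100 "N" bases, as described in the text
FLANK = 20          # overhang in bp; applying it to both sides is an assumption


def instance_with_overhang(genome_seq, start, end, flank=FLANK):
    """Return a repeat instance plus `flank` bp on each side.

    Coordinates are 0-based, half-open; the window is clipped at the
    ends of the sequence.
    """
    lo = max(0, start - flank)
    hi = min(len(genome_seq), end + flank)
    return genome_seq[lo:hi]


def build_family_record(canonical, instances):
    """Join the canonical sequence and the flanked instances with N spacers."""
    return SPACER.join([canonical] + list(instances))


# Toy example: a 10 bp "repeat" embedded in a 70 bp sequence.
chrom = "A" * 30 + "C" * 10 + "G" * 30
inst = instance_with_overhang(chrom, 30, 40)
record = build_family_record("CCCCCCCCCC", [inst])
print(len(inst), len(record))
```

The N spacers keep a read from aligning across two unrelated instances, while the overhang lets reads that span a repeat boundary still find a match.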
I also create a complement library containing the rest of the genome. I map each sample to the repeat library and to its complement using BWA. I then count the reads mapping to each repeat family by the following criteria:
I map the samples and their corresponding empty-vector controls and count the reads. I then calculate the maximum-likelihood estimate (MLE) of enrichment for each repeat family in the sample over the vector, normalized by the ratio of total mappable reads. Total mappable reads are determined by a separate alignment with bowtie2.
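One way to write down this estimate is sketched below. It assumes read counts follow a Poisson model, under which the maximum-likelihood estimate of the sample/vector rate ratio is simply the ratio of depth-normalized counts. The function name, the log2 scale, and the optional pseudocount are assumptions for illustration, not necessarily the exact estimator used here.

```python
import math


def enrichment_mle(sample_count, vector_count,
                   sample_total, vector_total, pseudocount=0.0):
    """Log2 enrichment of a repeat family in sample over vector.

    Under an assumed Poisson model, the ML estimate of the rate ratio is
    (sample_count / sample_total) / (vector_count / vector_total).
    `pseudocount` is an optional addition (hypothetical) for families
    with zero counts; with a nonzero value the result is no longer the
    strict MLE.
    """
    s = sample_count + pseudocount
    v = vector_count + pseudocount
    if s <= 0 or v <= 0:
        raise ValueError("counts must be positive (or use a pseudocount)")
    depth_ratio = sample_total / vector_total  # ratio of total mappable reads
    return math.log2(s / (v * depth_ratio))


# Same depth in both libraries: 200 vs 50 reads is a 4-fold enrichment.
print(enrichment_mle(200, 50, 1_000_000, 1_000_000))
# Sample sequenced twice as deep: the same counts give a 2-fold enrichment.
print(enrichment_mle(200, 50, 2_000_000, 1_000_000))
```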
I then cluster the samples by their MLE values and make heatmaps to see which families are uniquely enriched in DHX9 versus the others.
[Heatmap panels, two sets: ALL; ALU ONLY]
To check Alu enrichment even without any mapping, I cluster the repeats directly from the FASTA files. For this, I first sample 100K reads from each file, so that the number of nodes in the top clusters is comparable across samples. Pairwise alignment (BLAST) is then performed among all the sampled reads, and the reads are clustered by their overlaps with each other. Since repeat-containing reads overlap more with each other, this method identifies the repeat clusters in the data.
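The sampling and clustering idea can be illustrated with a small sketch. It assumes the BLAST hits have already been reduced to pairs of overlapping read IDs, and it groups reads by simple connected components (union-find). That is a simplification chosen for illustration; graph-based repeat clustering tools typically use more elaborate community detection. All function names are hypothetical.

```python
import random
from collections import defaultdict


def sample_reads(read_ids, n=100_000, seed=0):
    """Draw n reads without replacement (all reads if fewer than n)."""
    ids = list(read_ids)
    if len(ids) <= n:
        return ids
    return random.Random(seed).sample(ids, n)


def cluster_by_overlap(read_ids, overlaps):
    """Group reads into connected components of the overlap graph.

    `overlaps` is an iterable of (read_a, read_b) pairs, e.g. parsed
    from tabular BLAST output (the parsing step is assumed, not shown).
    Returns clusters sorted from largest to smallest.
    """
    read_ids = list(read_ids)
    parent = {r: r for r in read_ids}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    clusters = defaultdict(list)
    for r in read_ids:
        clusters[find(r)].append(r)
    return sorted(clusters.values(), key=len, reverse=True)


reads = ["r1", "r2", "r3", "r4", "r5"]
hits = [("r1", "r2"), ("r2", "r3")]
print(cluster_by_overlap(reads, hits))
```

Sampling the same number of reads from every file keeps cluster sizes comparable, since the size of a repeat cluster then reflects the fraction of reads carrying that repeat rather than the sequencing depth.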
After assembling the clusters, I run RepeatMasker on the reads in each cluster to detect the repeat families within. I then plot these clusters and annotate the three major Alu subfamilies within the top clusters detected in each sample.
The following are the top clusters (cluster1) for each sample, annotated for Alu families (if present).
The colours may have to be changed (bgcolor=black).
Colors: orange = AluS, yellow = AluY, green = AluJ, brown = others
Colors: orange = AluS, yellow = AluY, green = AluJ, purple = others